Quantitative Methodology (UPF)
It's a generic name. It can be almost anything.
How Excel stores data in two dimensions:
A way to store data in R in two dimensions (rows and columns):
# A tibble: 17,548 × 9
scode country year polity2 xrreg xrcomp xropen xconst parreg
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 AFG Afghanistan 1800 -6 3 1 1 1 3
2 AFG Afghanistan 1801 -6 3 1 1 1 3
3 AFG Afghanistan 1802 -6 3 1 1 1 3
4 AFG Afghanistan 1803 -6 3 1 1 1 3
5 AFG Afghanistan 1804 -6 3 1 1 1 3
6 AFG Afghanistan 1805 -6 3 1 1 1 3
7 AFG Afghanistan 1806 -6 3 1 1 1 3
8 AFG Afghanistan 1807 -6 3 1 1 1 3
9 AFG Afghanistan 1808 -6 3 1 1 1 3
10 AFG Afghanistan 1809 -6 3 1 1 1 3
# … with 17,538 more rows
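As a minimal sketch of this two-dimensional structure, the first three rows of the table above can be rebuilt by hand. The slides print a tibble; using base R's `data.frame()` here is an assumption made to keep the example free of package dependencies, and only four of the nine columns are kept for brevity.

```r
# Each vector becomes a column; all columns have the same length,
# one element per row.
polity <- data.frame(
  scode   = c("AFG", "AFG", "AFG"),
  country = c("Afghanistan", "Afghanistan", "Afghanistan"),
  year    = c(1800, 1801, 1802),
  polity2 = c(-6, -6, -6)
)

dim(polity)                    # number of rows, then number of columns
polity[2, "year"]              # one cell: row 2, column "year"
polity$polity2                 # one column, as a vector
polity[polity$year == 1802, ]  # one row, selected by a condition
```

Rows are addressed before the comma and columns after it, which mirrors the rows-and-columns layout of a spreadsheet.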
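We consider a data frame tidy if it fulfills the following requirements (Wickham 2014): each variable forms a column, each observation forms a row, and each type of observational unit forms a table.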
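A minimal sketch of what tidying can look like in practice. The `wide` table and its values are hypothetical, invented only for illustration: the variable "year" is hidden in the column names, so the table is not tidy. The reshaping is done by hand in base R so that each step is visible.

```r
# Hypothetical untidy table: years are spread across column names.
wide <- data.frame(
  country    = c("Belgium", "France"),
  value_1967 = c(10, 20),
  value_1968 = c(11, 21)
)

# Tidy version: one column per variable (country, year, value),
# one row per country-year observation.
tidy <- data.frame(
  country = rep(wide$country, times = 2),
  year    = rep(c(1967, 1968), each = nrow(wide)),
  value   = c(wide$value_1967, wide$value_1968)
)
tidy <- tidy[order(tidy$country, tidy$year), ]
tidy
```

In a tidyverse workflow the same reshaping would more typically be done with `tidyr::pivot_longer()`; the manual version is used here only to avoid loading packages.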
Load packages.
We need to decide what the units of interest are.
Ethnic Power Relations, International Conflict Research.
# A tibble: 14 × 5
countryname year groupname statusname groupsize
<chr> <dbl> <chr> <chr> <dbl>
1 Belgium 1967 Flemings JUNIOR PARTNER 0.59
2 Belgium 1967 Walloon SENIOR PARTNER 0.4
3 Belgium 1967 Germans IRRELEVANT 0.01
4 France 1967 French MONOPOLY 0.976
5 France 1967 Basques POWERLESS 0.013
6 France 1967 Corsicans POWERLESS 0.004
7 France 1967 Roma DISCRIMINATED 0.006
8 Belgium 1968 Flemings JUNIOR PARTNER 0.59
9 Belgium 1968 Walloon SENIOR PARTNER 0.4
10 Belgium 1968 Germans IRRELEVANT 0.01
11 France 1968 French MONOPOLY 0.976
12 France 1968 Basques POWERLESS 0.013
13 France 1968 Corsicans POWERLESS 0.004
14 France 1968 Roma DISCRIMINATED 0.006
# A tibble: 477 × 8
cowcode region year country no coup successful combat
<dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
1 40 5 1952 Cuba 1 1 1 1
2 40 5 1957 Cuba 1 1 0 1
3 41 5 1950 Haiti 1 1 1 0
4 41 5 1956 Haiti 1 1 0 0
5 41 5 1957 Haiti 1 1 1 0
6 41 5 1957 Haiti 2 1 1 0
7 41 5 1957 Haiti 3 1 1 0
8 41 5 1958 Haiti 1 1 0 1
9 41 5 1970 Haiti 1 1 0 0
10 41 5 1986 Haiti 1 1 1 0
# … with 467 more rows
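One way to check what a row represents is to look for the smallest set of columns that identifies every row uniquely. The sketch below rebuilds three columns of the ten rows printed above; `is_key()` is a hypothetical helper written for this example, not a function from any package.

```r
# The ten rows shown above, keeping only the identifying columns.
coups <- data.frame(
  country = c("Cuba", "Cuba", rep("Haiti", 8)),
  year    = c(1952, 1957, 1950, 1956, 1957, 1957, 1957, 1958, 1970, 1986),
  no      = c(1, 1, 1, 1, 1, 2, 3, 1, 1, 1)
)

# Hypothetical helper: TRUE if the given columns identify every row uniquely.
is_key <- function(data, cols) {
  anyDuplicated(data[cols]) == 0
}

is_key(coups, c("country", "year"))        # FALSE: Haiti 1957 appears three times
is_key(coups, c("country", "year", "no"))  # TRUE: one row per coup
```

In these rows country and year alone do not identify an observation, which suggests that the unit here is the coup rather than the country-year.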
When the unit of analysis (UA) and the unit of observation (UO) are not the same, we risk committing an ecological fallacy.
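A small numeric sketch of how this can happen. The data are entirely hypothetical (three districts, four people each, two generic variables `x` and `y`) and were chosen so that the relationship between individuals is the opposite of the relationship between district averages.

```r
# Hypothetical individual-level data.
people <- data.frame(
  district = rep(c("A", "B", "C"), each = 4),
  x = c(1, 2, 3, 4, 3, 4, 5, 6, 5, 6, 7, 8),
  y = c(4, 3, 2, 1, 6, 5, 4, 3, 8, 7, 6, 5)
)

# Individual level: correlation between x and y inside each district.
cor_within <- sapply(
  split(people, people$district),
  function(d) cor(d$x, d$y)
)

# Aggregate level: one row per district, holding the district means.
districts <- aggregate(cbind(x, y) ~ district, data = people, FUN = mean)
cor_between <- cor(districts$x, districts$y)

cor_within   # -1 in every district
cor_between  # 1 across district means
```

If only the district means were observed, concluding that individuals with higher `x` also have higher `y` would be wrong for every district in this example.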
Barcelona local elections: District level.
Barcelona local elections: Neighbourhood level.
Barcelona local elections: Census section level.
A characteristic of the object we’re studying.
# A tibble: 6 × 5
region municipality religion population suicide
<chr> <chr> <chr> <dbl> <dbl>
1 Isère Grenoble Protestant 8250 520
2 Isère Grenoble Catholic 1080 72
3 Isère Le Bourg-d'Oisans Protestant 325 12
4 Isère Le Bourg-d'Oisans Catholic 593 20
5 Isère Saint-Jean-de-Maurienne Protestant 181 5
6 Isère Saint-Jean-de-Maurienne Catholic 392 11
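As an illustration, the table above can be rebuilt in base R and used to derive a new variable from existing ones. Expressing the rate per 1,000 inhabitants is an assumption made for this sketch; the slides do not specify a scale.

```r
# The six rows shown above (the constant region column is omitted).
suicides <- data.frame(
  municipality = rep(
    c("Grenoble", "Le Bourg-d'Oisans", "Saint-Jean-de-Maurienne"),
    each = 2
  ),
  religion   = rep(c("Protestant", "Catholic"), times = 3),
  population = c(8250, 1080, 325, 593, 181, 392),
  suicide    = c(520, 72, 12, 20, 5, 11)
)

# A new variable computed from two existing ones
# (assumed scale: suicides per 1,000 inhabitants).
suicides$rate <- 1000 * suicides$suicide / suicides$population

# Changing the unit: one row per religion instead of one row
# per municipality-religion combination.
by_religion <- aggregate(
  cbind(population, suicide) ~ religion,
  data = suicides,
  FUN = sum
)
by_religion
```

Each column describes one characteristic of the rows, and aggregating changes what a row stands for while the variables keep their meaning.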
Unordered categories:
For strings, see stringr (Wickham 2022) and its cheatsheet.
Storage: Character, factor
Operations: ==, %in%, !=
Ordered categories:
For factors, see forcats (Wickham 2021) and its cheatsheet.
Storage: Ordered factor
Operations: ==, %in%, !=, >, >=, <, <=
Numbers, zero is arbitrary.
Storage: Numeric, integer, date
Operations: ==, %in%, !=, >, >=, <, <=, +, -, max(), min()
Numbers, zero has meaning.
Storage: Numeric
Operations: ==, %in%, !=, >, >=, <, <=, +, -, *, /, sqrt(), log(), exp(), max(), min(), mean()…
| Type | Characteristics | Vector | Operations |
|---|---|---|---|
| Nominal categorical | Unordered categories | Character or factor | ==, %in%, != |
| Ordinal categorical | Ordered categories | Ordered factor | ==, %in%, !=, <=, <, >, >= |
| Interval numeric | Numbers, zero has no meaning | Numeric or integer | ==, !=, <=, <, >, >=, +, - |
| Ratio numeric | Numbers, zero has meaning | Numeric | ==, !=, <=, <, >, >=, +, -, *, / … |
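The four types in the table above can be tried out in base R. This is a minimal sketch: the `religion` and `groupsize` values are taken from tables shown earlier, while the `agreement` levels are hypothetical.

```r
# Nominal: unordered categories, stored as character (or factor).
religion <- c("Protestant", "Catholic", "Protestant")
religion == "Catholic"
religion %in% c("Catholic", "Orthodox")

# Ordinal: an ordered factor, so comparisons are defined.
# These levels are hypothetical.
agreement <- factor(
  c("low", "high", "medium"),
  levels  = c("low", "medium", "high"),
  ordered = TRUE
)
agreement >= "medium"
max(agreement)

# Interval: zero is arbitrary, so differences make sense but ratios do not.
dates <- as.Date(c("1967-01-01", "1968-01-01"))
dates[2] - dates[1]  # 365 days

# Ratio: zero means "none", so multiplication and division make sense.
groupsize <- c(0.59, 0.40, 0.01)
groupsize[1] / groupsize[2]
mean(groupsize)
```

Comparing the levels of an unordered factor with `<` or `>` returns `NA` with a warning, which is why ordinal variables need `ordered = TRUE`.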